Exploiting inter task dependencies for dynamic load balancing

-Becker, W.; Waldmann, G. 

Inst. for Parallel & Distributed High Performance Syst., Stuttgart Univ., Germany

This paper appears in: High Performance Distributed Computing, 1994, Proceedings of the Th

International Symposium on 

On page(s): 157-165
2-5 Aug. 1994 
1994 

ISBN: 0-8186-6395-2 

IEEE Catalog Number: 94TH0667-6 

Number of Pages: xiii+304 

References Cited: 21 

INSPEC Accession Number: 4778644 



Abstract: 

The major goal of dynamic load balancing is not primarily to equalize the load on the nodes of a paral 
system, but to optimize the average response time of single requests or the throughput of all applicatio 
system. Therefore it is often necessary not only to keep all processors busy and all processor ready qu 
within the same range, but to avoid delays and inefficient computations caused by foreseeable but igno 
and precedence constraints between related tasks. We present concepts for dynamic consideration of i 
dependencies within small groups of tasks and evaluate them observing real applications in a load bal 
environment on a network of workstations. The concepts are developed from scheduling of single task 
towards heterogeneous multi user operation scenarios. 
Scan line graphics generation on the massively parallel processor

- Dorband, J.E.

NASA, Greenbelt, MD, USA

This paper appears in: Frontiers of Massively Parallel Computation, 1988. Proceedings., 2nd S 
the Frontiers of 

On page(s): 327 - 329
10-12 Oct. 1988 
1989 

ISBN: 0-8186-5892-4 

Number of Pages: xxxxi+736 

References Cited: 3 

INSPEC Accession Number: 3532276 



Abstract: 

The author describes the implementation of a scan line graphics generation algorithm on the massivel
processor MPP. Pixels are computed in parallel and their results are applied to the Z buffer in large gr 
perform pixel value calculations, facilitate load balancing across the processors and apply the results t 
efficiently in parallel requires special virtual routing (sort computation) techniques developed by the a
especially for use on single-instruction multiple-data (SIMD) architectures. This involves a preproces 
step which determines how much of the sort is necessary to provide sufficient contiguous space to dup 
data. Once this has been determined a sort is used to compress the data which can be terminated early 
information derived by the scout step. This then gives the ability to keep as many processors as possib 
reasonable efficiency. 
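
As a point of reference, the Z-buffer update that the abstract refers to can be sketched sequentially. This is a minimal illustration of the standard depth test only, not the author's SIMD routing or sort technique; the function names, the fragment tuple layout and the smaller-is-closer depth convention are assumptions made for this example.

```python
def make_buffers(width, height, background=0):
    """Create a depth buffer (initialised to infinity) and a value buffer."""
    zbuf = [[float("inf")] * width for _ in range(height)]
    values = [[background] * width for _ in range(height)]
    return zbuf, values


def apply_fragments(zbuf, values, fragments):
    """Apply a group of (x, y, depth, value) fragments to the Z buffer.

    Assumption: a smaller depth means the fragment is closer to the viewer,
    so it replaces whatever is currently stored at that pixel.
    """
    for x, y, depth, value in fragments:
        if depth < zbuf[y][x]:
            zbuf[y][x] = depth
            values[y][x] = value
    return zbuf, values


# Three fragments compete for pixel (0, 0); the closest one (depth 3.0) wins.
zbuf, values = make_buffers(2, 2)
apply_fragments(zbuf, values, [
    (0, 0, 5.0, "a"),
    (0, 0, 3.0, "b"),
    (0, 0, 4.0, "c"),
    (1, 1, 1.0, "d"),
])
```

On a SIMD machine the loop body would run for many pixels at once; the sketch only fixes the per-pixel rule that such a parallel update has to preserve.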
FPAR:

PURPOSE: To attain the parallel processing for transfer of data and to increase 
the processing speed of a data flow processor, by using a detecting function for 
data transfer to plural bus groups as well as a simultaneous access function for 
memory addresses of instructions. 



FPAR: 

CONSTITUTION: The instructions stored in a memory part M are sent to the 1st bus
groups Bus 1-1 ∼ Bus 3-1 via an interface part Di together with data and then
undergo the arithmetic processing through corresponding arithmetic parts
f1 ∼ f3 via an interface R. The results of the arithmetic processing are
delivered after the idle state is decided at an arithmetic output interface part
D for the 2nd bus groups Bus 1-2 ∼ Bus 3-2. Then an interface part A of the part
M decodes the address information to which the next execution instruction is
designated for the data on the 2nd bus groups when these groups are filled. Then
an access is given to an instruction corresponding to the decoded information
from the part M. This instruction is sent to the 1st bus together with the next
arithmetic object data through the part Di. Then the transfer of data is
processed in parallel. This can accelerate the processing speed of a data flow
processor.
TITLE: Storage Protection Mechanism for Processor. March 1980.



TBTX: 

3p. This article describes a mechanism for checking for possible software errors
that could disrupt the integrity of the main storage in a processor, thus
providing some degree of storage protection. The Storage Protection Device
utilizes circuitry shown in Fig. 1 and is adaptable to a system without affecting
the hardware or data flow of a processor having a Storage Relocation Translator
such as that described in U.S. Patents 4,037,215 and 4,050,094. It serves to
control processor channel functions without being part of the channel.

Storage access is controlled through bits contained in the storage relocation
translator segmentation registers. The segmentation register bit format is shown
in Fig. 2.

- Bit S (Summary): If set to "1", this bit indicates that the programmer has set
  his address to point to a location greater than the maximum storage defined for
  that processor. Therefore, the hardware will prevent this storage access and
  flag the address as an Invalid Storage Address (ISA).
- Bits A, 0, 1, 2, 3, 4: These are the high-order physical address bits of a
  17 bit address.
- Bit V (Valid): This bit must be set to a "1". If set to zero, the address
  associated with that segmentation register is considered invalid. If an attempt
  is made to use this address, the hardware will prevent this storage access and
  flag the address as an ISA.
- Bit R (Read Only): If set to a "1", it serves to block storage writes to the
  location in storage specified by that segmentation register. An attempt to
  write to this location with the read-only bit "on" will result in a Protect
  Check in the processor status word (PSW) and an end to the storage access.

A storage
cycle is initiated by a Set Storage Address Register instruction. This is
followed by a signal from the channel to storage, called Storage Gate A, which
indicates that a storage address is on the Storage Address Bus. This is followed
by another signal from the channel to storage, called Storage Gate B, which gates
data to the Data Bus. When Storage Gate A and Storage Gate B are received by
storage, a signal from storage to the channel is generated, called Synchronous
Storage Return, meaning that storage is capable of operating at synchronous
speed. Should this cycle not be completed within a predetermined time interval,
such as 6.4 microseconds, the channel will time out and the processor will
generate an ISA check condition.

The circuitry of Fig. 1 checks for the invalid
conditions aforementioned and determines whether Storage Gate B will be blocked
or not. If Storage Gate B is blocked, the storage access cycle will not be
completed. Synchronous Storage Return would never be generated, and the channel
would time out.

The logic circuitry in Fig. 1 is active only when the storage relocation
translator is enabled, as evidenced by active control lines 1 and 2.
The upper group of circuits 3-5 provides the checking operation when the storage
transfer is either between the processor and main storage or between the channel
and main storage when operating in a direct program control (DPC) mode. Control
line 6 disables the Storage Gate B line to gate 7 when segmentation register bit
V indicates a not-valid condition. Control line 8 will disable gate 7 when
bit S indicates a programming error when initially loading the segmentation
registers. The output of AND circuit 5 will disable gate 7 when the processor
is doing a write to main storage and bit R indicates a read-only status.

The lower group of circuits 10-13 performs a similar checking operation for
segmentation register bits S and V when cycle steal data transfers are being
performed between the channel and the main storage unit. Gate 7 is disabled if
bit S has a value of 1 (programming error) or bit V has a value of 0 (not valid).
Circuits 12 and 13 provide a latch circuit, line 14 being the set input and line
15 being the reset input.
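
The decision of whether Storage Gate B is passed or blocked, as described in this article, can be summarised in a short software sketch. It is an illustrative model only, not the circuit of Fig. 1: the class and function names and the result labels are hypothetical, the bits are modelled as booleans because the bit positions of Fig. 2 are not reproduced here, and the behaviour with the translator disabled is an assumption drawn from the statement that the logic is active only when the translator is enabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentationRegister:
    """Hypothetical model of the control bits named in the text."""
    s: bool  # Summary: address points beyond the maximum defined storage
    v: bool  # Valid: must be set for the register to be usable
    r: bool  # Read Only: blocks storage writes when set


def storage_gate_b_allowed(reg, is_write, cycle_steal=False,
                           translator_enabled=True):
    """Return (allowed, condition) for one storage access.

    `condition` is "OK", "ISA" or "PROTECT_CHECK" (labels chosen for
    this sketch).
    """
    if not translator_enabled:
        # Assumption: with the translator disabled the checking logic is
        # inactive, so it does not block Storage Gate B.
        return True, "OK"
    if reg.s or not reg.v:
        # Gate B is blocked, Synchronous Storage Return is never
        # generated, the channel times out and an ISA check results.
        return False, "ISA"
    if is_write and reg.r and not cycle_steal:
        # The text describes the read-only check only for the upper
        # circuit group (processor or DPC transfers).
        return False, "PROTECT_CHECK"
    return True, "OK"
```

For example, `storage_gate_b_allowed(SegmentationRegister(s=False, v=True, r=True), is_write=True)` returns `(False, "PROTECT_CHECK")`, while the same register read, or written during a cycle steal transfer, is allowed in this model.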